    Quasi-stationary distributions as centrality measures of reducible graphs

    A random walk can be used as a centrality measure of a directed graph. However, if the graph is reducible, the random walk will be absorbed in some subset of nodes and will never visit the rest of the graph. In Google PageRank the problem is solved by introducing uniform random jumps with some probability; up to the present, there is no clear criterion for the choice of this parameter. We propose a parameter-free centrality measure based on the notion of quasi-stationary distribution. Specifically, we suggest four quasi-stationarity-based centrality measures, analyze them, and conclude that they produce approximately the same ranking. The new centrality measures can be applied in spam detection to detect "link farms" and in image search to find photo albums.
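
    As a rough illustration of the underlying idea (not the paper's four specific measures), the quasi-stationary distribution of the random walk restricted to the transient part of a reducible graph can be taken as the normalized left Perron eigenvector of the corresponding substochastic matrix. A minimal sketch, assuming that convention:

        import numpy as np

        def quasi_stationary_centrality(Q):
            # Q: substochastic transition matrix of the random walk restricted
            # to the transient (non-absorbing) nodes of a reducible graph.
            # The quasi-stationary distribution is the normalized left
            # eigenvector of Q for its largest (Perron) eigenvalue.
            eigvals, eigvecs = np.linalg.eig(Q.T)   # columns: left eigenvectors of Q
            k = np.argmax(eigvals.real)             # index of the Perron eigenvalue
            v = np.abs(eigvecs[:, k].real)
            return v / v.sum()

        # Toy example: three transient nodes with probability mass leaking
        # towards an absorbing part of the graph (rows sum to less than 1).
        Q = np.array([[0.0, 0.5, 0.3],
                      [0.4, 0.0, 0.4],
                      [0.2, 0.3, 0.0]])
        print(quasi_stationary_centrality(Q))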

    Tensor approach to mixed high-order moments of absorbing Markov chains

    Moments of an absorbing Markov chain are considered. First moments and non-mixed second moments are derived in classical textbooks such as "Finite Markov Chains" by J. Kemeny and J. Snell, since these moments can be expressed easily in matrix form. Because the representation of mixed moments of higher orders in matrix form is not straightforward, if possible at all, they have not been calculated. This paper fills that gap: a tensor approach to the mixed high-order moments is proposed, and compact closed-form expressions for the moments are derived.
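
    For context, the classical matrix-form quantities mentioned above are straightforward to compute from the fundamental matrix; the mixed higher-order moments treated by the paper's tensor approach are not reproduced here. A minimal sketch of the textbook formulas, assuming Q is the transient-to-transient block of the transition matrix:

        import numpy as np

        def absorption_moments(Q):
            # Classical results from Kemeny & Snell, "Finite Markov Chains":
            # N[i, j]  = expected number of visits to transient state j
            #            before absorption, starting from i,
            # t[i]     = expected number of steps to absorption from i,
            # var_t[i] = variance of the number of steps to absorption from i.
            n = Q.shape[0]
            N = np.linalg.inv(np.eye(n) - Q)          # fundamental matrix
            t = N @ np.ones(n)                        # first moments
            var_t = (2 * N - np.eye(n)) @ t - t**2    # non-mixed second moments
            return N, t, var_t

        # Example: two transient states feeding an absorbing one.
        Q = np.array([[0.2, 0.6],
                      [0.3, 0.4]])
        N, t, var_t = absorption_moments(Q)
        print(t, var_t)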

    Monte Carlo Methods for Top-k Personalized PageRank Lists and Name Disambiguation

    We study the problem of quick detection of top-k Personalized PageRank lists. This problem has a number of important applications, such as finding local cuts in large graphs, estimation of similarity distance, and name disambiguation. In particular, we apply our results to construct efficient algorithms for the person name disambiguation problem. We argue that two observations are important when finding top-k Personalized PageRank lists. Firstly, it is crucial to quickly detect the top-k most important neighbours of a node, while the exact order within the top-k list, as well as the exact values of PageRank, is far less important. Secondly, a small number of wrong elements in a top-k list does not really degrade its quality, but tolerating them can lead to significant computational savings. Based on these two key observations, we propose Monte Carlo methods for fast detection of top-k Personalized PageRank lists. We provide a performance evaluation of the proposed methods and supply stopping criteria. We then apply the methods to the person name disambiguation problem. The developed algorithm achieved second place in the WePS 2010 competition.
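
    The paper's exact estimators and stopping criteria are not reproduced here, but the flavour of a Monte Carlo end-point method for Personalized PageRank can be sketched as follows: run many short random walks from the seed node, terminating at each step with probability 1 - c, and rank nodes by how often a walk ends at them. The graph, node names, and parameter defaults below are illustrative only.

        import random
        from collections import Counter

        def topk_ppr_monte_carlo(adj, seed, k, c=0.85, num_walks=10000):
            # adj: dict mapping node -> list of out-neighbours.
            # Each walk starts at `seed` and continues with probability c;
            # the fraction of walks ending at a node estimates its
            # Personalized PageRank with respect to `seed`.
            counts = Counter()
            for _ in range(num_walks):
                node = seed
                while random.random() < c:
                    neighbours = adj.get(node)
                    if not neighbours:          # dangling node: restart at seed
                        node = seed
                    else:
                        node = random.choice(neighbours)
                counts[node] += 1
            return [n for n, _ in counts.most_common(k)]

        # Toy directed graph.
        adj = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
        print(topk_ppr_monte_carlo(adj, seed=0, k=3))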

    Pagerank based clustering of hypertext document collections

    Clustering a hypertext document collection is an important task in Information Retrieval. Most clustering methods are based on document content and do not take the hypertext links into account. Here we propose a novel PageRank-based clustering (PRC) algorithm which uses the hypertext structure. The PRC algorithm produces graph partitionings with high modularity and coverage. A comparison of the PRC algorithm with two content-based clustering algorithms shows that there is a good match between PRC clustering and content-based clustering.
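
    The abstract evaluates partitions by modularity and coverage. As an illustration of these two metrics only (using the standard undirected Newman modularity; the paper may use a directed variant on the hypertext graph, and its PRC algorithm itself is not sketched here), a partition can be scored as follows:

        def modularity_and_coverage(edges, labels):
            # edges:  list of undirected (u, v) pairs
            # labels: dict mapping node -> cluster id
            # coverage   = fraction of edges with both endpoints in one cluster
            # modularity = coverage minus the value expected in a random graph
            #              with the same degree sequence
            m = len(edges)
            deg, cluster_deg = {}, {}
            intra = 0
            for u, v in edges:
                deg[u] = deg.get(u, 0) + 1
                deg[v] = deg.get(v, 0) + 1
                if labels[u] == labels[v]:
                    intra += 1
            for node, d in deg.items():
                cluster_deg[labels[node]] = cluster_deg.get(labels[node], 0) + d
            coverage = intra / m
            modularity = coverage - sum((d / (2 * m)) ** 2 for d in cluster_deg.values())
            return modularity, coverage

        # Two triangles joined by a single edge, split into clusters A and B.
        edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
        labels = {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B', 5: 'B'}
        print(modularity_and_coverage(edges, labels))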